Introduction

This notebook analyzes the Default of Credit Card Clients data set from the UCI Machine Learning Repository and builds predictive models on it. The data set can be downloaded separately from its UCI Machine Learning Repository page.

Relevant Papers

In their paper "The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients" (Yeh, I. C. & Lien, C. H., 2009), Yeh and Lien review six data mining techniques (discriminant analysis, logistic regression, naive Bayes classifier, nearest neighbor, artificial neural networks, and classification trees) and their application to credit scoring. They then compare the classification accuracy of these techniques on real cardholders' credit risk data from Taiwan.

In another paper, "Machine Learning Approaches to Predict Default of Credit Card Clients" (Liu, R. L., 2018), Liu compares traditional machine learning models (Support Vector Machine, k-Nearest Neighbors, Decision Tree, and Random Forest) with a Feedforward Neural Network and a Long Short-Term Memory network.

Attribute Information

Below is a description of the attributes that will be used in our models, for a better understanding of the data:

  • LIMIT_BAL: Amount of the given credit (NT dollar). It includes both the individual consumer credit and his/her family (supplementary) credit.
  • SEX: Gender (1 = male; 2 = female).
  • EDUCATION: Education (1 = graduate school; 2 = university; 3 = high school; 4 = others).
  • MARRIAGE: Marital status (1 = married; 2 = single; 3 = others).
  • AGE: Age (year).
  • PAY_1 – PAY_6: Repayment status in September, August, July, June, May and April 2005, respectively (PAY_1 = September, …, PAY_6 = April). The measurement scale for the repayment status is: -1 = pay duly; 1 = payment delay for one month; 2 = payment delay for two months; …; 8 = payment delay for eight months; 9 = payment delay for nine months and above. (The raw data also contains the values -2 and 0, which this scale does not document; they appear later as one-hot columns.)
  • BILL_AMT1 – BILL_AMT6: Amount of bill statement (NT dollar) in September, August, July, June, May and April 2005, respectively.
  • PAY_AMT1 – PAY_AMT6: Amount of previous payment (NT dollar) paid in September, August, July, June, May and April 2005, respectively.
  • dpnm: Default payment next month (1 = yes; 0 = no).
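
For exploration and plotting, the integer codes above can be mapped to readable labels. A minimal sketch with pandas on toy rows; the label strings are our own choice, while the integer codes follow the attribute description above:

```python
import pandas as pd

# Toy rows using the coded columns described above.
sample = pd.DataFrame({'SEX': [1, 2, 2], 'MARRIAGE': [1, 2, 3]})

# Map the integer codes to the labels given in the attribute description.
sample['SEX_LABEL'] = sample['SEX'].map({1: 'male', 2: 'female'})
sample['MARRIAGE_LABEL'] = sample['MARRIAGE'].map(
    {1: 'married', 2: 'single', 3: 'others'})
print(sample)
```

The same mapping can be applied to the full data set when readable tick labels are needed in plots.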

Models

We will create three models, make predictions with each, and compare the results with the two papers above. These models are:

  • Logistic Regression
  • Decision tree
  • Neural Network

Metrics

In order to be consistent with the original paper and have the same basis for our results, we use the same metrics: accuracy and F1 score. Accuracy is $\frac{\text{number of correct predictions}}{\text{number of samples}}$. When the data set is imbalanced, accuracy may not be sufficient, because simply predicting the majority class for every sample still yields high accuracy. In such situations a better metric is the F1 score, calculated as $\frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$, where precision is $\frac{TP}{TP + FP}$ and recall is $\frac{TP}{TP + FN}$ (TP = true positives, FP = false positives, FN = false negatives). Precision measures how many of the samples predicted positive actually are positive, and recall measures the proportion of actual positive samples that are identified. The F1 score ranges from 0 (no true positive predictions) to 1 (perfect precision and recall).
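
As a minimal illustration of why accuracy alone can mislead on imbalanced data, the following sketch uses toy labels of our own:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy imbalanced labels: 8 negatives, 2 positives.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])

# A "lazy" model that always predicts the majority class.
y_lazy = np.zeros(10, dtype=int)
print('Accuracy: %.2f' % accuracy_score(y_true, y_lazy))             # 0.80
print('F1:       %.2f' % f1_score(y_true, y_lazy, zero_division=0))  # 0.00

# A model that finds one of the two positives, with one false alarm.
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 0])
print('Precision: %.2f' % precision_score(y_true, y_pred))  # 1/(1+1) = 0.50
print('Recall:    %.2f' % recall_score(y_true, y_pred))     # 1/(1+1) = 0.50
```

The lazy model scores 80% accuracy while learning nothing; its F1 score of 0 exposes it immediately.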

In addition to the above, we compute the confusion matrix for each model, as well as the Area Under the Curve (AUC), and plot the ROC curves. A typical ROC curve has the False Positive Rate (FPR) on the x-axis and the True Positive Rate (TPR) on the y-axis. The AUC is the area between the ROC curve and the x-axis: the larger it is, the better the model is at distinguishing the given classes. The ideal value for the AUC is 1, while a random classifier scores 0.5.
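
These quantities can be computed directly with scikit-learn. A small sketch with four samples and hypothetical predicted probabilities of our own:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Two negatives, two positives, with made-up positive-class probabilities.
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)
print('FPR: %s' % fpr)    # x-coordinates of the ROC curve
print('TPR: %s' % tpr)    # y-coordinates of the ROC curve
print('AUC: %.2f' % auc)  # 0.75 for these scores
```

For these scores three of the four positive/negative pairs are ranked correctly, hence AUC = 3/4.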

Goal

Using the models we created, we will try to predict the class value of the dpnm column with better scores (accuracy and F1) than those reported in the two papers.

Import libraries/packages

In [1]:
### General libraries ###
import pandas as pd
from pandas.api.types import CategoricalDtype
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import seaborn as sns
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import graphviz 
from graphviz import Source
from IPython.display import SVG
import os

##################################

### ML Models ###
from sklearn.linear_model import LogisticRegression
from sklearn import tree
from sklearn.tree import export_text  # in older scikit-learn: sklearn.tree.export
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler

##################################

### Metrics ###
from yellowbrick.classifier import ConfusionMatrix
from sklearn import metrics
from sklearn.metrics import f1_score,confusion_matrix, mean_squared_error, mean_absolute_error, classification_report, roc_auc_score, roc_curve, precision_score, recall_score

Part 1: Load and clean the data

In this section we load the data from the CSV file and check for any "impurities", such as null values or duplicate rows. If any appear, we remove them from the data set. We also plot the correlations of the class column with all the other columns.

In [2]:
# Load the data.
data = pd.read_csv('default of credit card clients.csv')

# Information
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 30000 entries, 0 to 29999
Data columns (total 25 columns):
ID           30000 non-null int64
LIMIT_BAL    30000 non-null int64
SEX          30000 non-null int64
EDUCATION    30000 non-null int64
MARRIAGE     30000 non-null int64
AGE          30000 non-null int64
PAY_1        30000 non-null int64
PAY_2        30000 non-null int64
PAY_3        30000 non-null int64
PAY_4        30000 non-null int64
PAY_5        30000 non-null int64
PAY_6        30000 non-null int64
BILL_AMT1    30000 non-null int64
BILL_AMT2    30000 non-null int64
BILL_AMT3    30000 non-null int64
BILL_AMT4    30000 non-null int64
BILL_AMT5    30000 non-null int64
BILL_AMT6    30000 non-null int64
PAY_AMT1     30000 non-null int64
PAY_AMT2     30000 non-null int64
PAY_AMT3     30000 non-null int64
PAY_AMT4     30000 non-null int64
PAY_AMT5     30000 non-null int64
PAY_AMT6     30000 non-null int64
dpnm         30000 non-null int64
dtypes: int64(25)
memory usage: 5.7 MB

Since the ID column is for indexing purposes only, we remove it from the data set.

In [3]:
# Drop "ID" column.
data = data.drop(['ID'], axis=1)
In [4]:
# Check for null values, counting every null cell.
print(
    f"There are {data.isna().sum().sum()} cells with null values in the data set.")
There are 0 cells with null values in the data set.

Below is the plot of the correlation matrix for the data set.

In [5]:
# Plot of the correlation matrix for the data set
plt.figure(figsize=(20, 20))
sns.heatmap(data.corr(), annot=True, cmap='rainbow',
            cbar=False, linewidth=0.5, fmt='.2f')
plt.title('Correlation Matrix')
Out[5]:
Text(0.5, 1, 'Correlation Matrix')

On the correlation matrix we can see that the columns BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 and BILL_AMT6 are highly correlated (>0.90) with BILL_AMT1. Because of that, we can exclude them from our models and keep only BILL_AMT1, as we will see later.
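
The same check can be done programmatically instead of by eye. A hedged sketch, where `highly_correlated_pairs` is a hypothetical helper of our own:

```python
import numpy as np
import pandas as pd

def highly_correlated_pairs(df, threshold=0.90):
    """Return (col_a, col_b, correlation) for pairs above the threshold."""
    corr = df.corr().abs()
    # Keep the strict upper triangle so every pair is reported only once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [(a, b, upper.loc[a, b])
            for a in upper.index for b in upper.columns
            if pd.notna(upper.loc[a, b]) and upper.loc[a, b] > threshold]
```

Applied to the BILL_AMT columns of this data set, it should report the highly correlated pairs noted above.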

Part 2: Pre-processing

In this part we prepare the data for our models: we choose the columns that will be our independent variables and the column that holds the class we want to predict. Once that is done, we split the data into train and test sets and standardize them.

In [6]:
# Check for duplicate rows, ignoring the class column.
print(
    f"There are {data[data.columns[:-1]].duplicated().sum()} duplicate rows in the data set.")

# Remove rows that are duplicated across all columns.
data = data.drop_duplicates()
print("The duplicate rows were removed.")
There are 56 duplicate rows in the data set.
The duplicate rows were removed.
In [7]:
# Perform One Hot encoding on 'PAY_1', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6'.
data = pd.get_dummies(
    data, columns=['PAY_1', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6'])
In [8]:
# Distinguish attribute columns and class column.
# BILL_AMT2, BILL_AMT3, BILL_AMT4, BILL_AMT5 and BILL_AMT6 are excluded.
features = ['LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'BILL_AMT1', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6', 'PAY_1_-2', 'PAY_1_-1', 'PAY_1_0',
            'PAY_1_1', 'PAY_1_2', 'PAY_1_3', 'PAY_1_4', 'PAY_1_5', 'PAY_1_6', 'PAY_1_7', 'PAY_1_8', 'PAY_2_-2', 'PAY_2_-1', 'PAY_2_0',
            'PAY_2_1', 'PAY_2_2', 'PAY_2_3', 'PAY_2_4', 'PAY_2_5', 'PAY_2_6', 'PAY_2_7', 'PAY_2_8', 'PAY_3_-2', 'PAY_3_-1', 'PAY_3_0',
            'PAY_3_1', 'PAY_3_2', 'PAY_3_3', 'PAY_3_4', 'PAY_3_5', 'PAY_3_6', 'PAY_3_7', 'PAY_3_8', 'PAY_4_-2', 'PAY_4_-1', 'PAY_4_0',
            'PAY_4_1', 'PAY_4_2', 'PAY_4_3', 'PAY_4_4', 'PAY_4_5', 'PAY_4_6', 'PAY_4_7', 'PAY_4_8', 'PAY_5_-2', 'PAY_5_-1', 'PAY_5_0',
            'PAY_5_2', 'PAY_5_3', 'PAY_5_4', 'PAY_5_5', 'PAY_5_6', 'PAY_5_7', 'PAY_5_8', 'PAY_6_-2', 'PAY_6_-1', 'PAY_6_0', 'PAY_6_2',
            'PAY_6_3', 'PAY_6_4', 'PAY_6_5', 'PAY_6_6', 'PAY_6_7', 'PAY_6_8']

X = data[features]
y = data['dpnm']
In [9]:
# Split into train and test sets.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=251)
In [10]:
# Standardization: fit the scaler on the training set only, then apply it to both sets.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Part 3: Modeling

In this section we build and evaluate three models:

  • Logistic Regression
  • Decision tree
  • Neural network

Each model will be trained and used to make predictions on the test set. Accuracy, F1 score, precision, recall, confusion matrix and ROC AUC will be calculated for each model.
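
Since the same metric suite is computed for every model below, it can be collected in one helper. This is a sketch only: the `evaluate` name and dictionary layout are our own, and `zero_division=0` assumes scikit-learn >= 0.22.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(model, X_test, y_test):
    """Compute the metric suite used below for a fitted binary classifier."""
    pred = model.predict(X_test)
    probs = model.predict_proba(X_test)[:, 1]  # positive-class probabilities
    return {
        'Accuracy': accuracy_score(y_test, pred),
        'Precision': precision_score(y_test, pred, zero_division=0),
        'Recall': recall_score(y_test, pred, zero_division=0),
        'F1': f1_score(y_test, pred, zero_division=0),
        'ROC AUC': roc_auc_score(y_test, probs),
    }
```

A helper like this would make the per-model metric cells below one-liners, though we keep the explicit calls for readability.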

Logistic Regression

In [11]:
# Initialize a Logistic Regression estimator.
logreg = LogisticRegression(multi_class='auto', random_state=25, n_jobs=-1)

# Train the estimator.
logreg.fit(X_train, y_train)
Out[11]:
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='auto', n_jobs=-1, penalty='l2', random_state=25,
                   solver='warn', tol=0.0001, verbose=0, warm_start=False)
In [12]:
# Make predictions.
log_pred = logreg.predict(X_test)

# CV score (note: computed on the full, unscaled data set).
logreg_cv = cross_val_score(logreg, X, y, cv=10)

Metrics for Logistic Regression

In [13]:
# Accuracy: 1 is perfect prediction.
print('Accuracy: %.3f' % logreg.score(X_test, y_test))

# Cross-Validation accuracy
print('Cross-validation accuracy: %0.3f' % logreg_cv.mean())

# Precision
print('Precision: %.3f' % precision_score(y_test, log_pred))

# Recall
print('Recall: %.3f' % recall_score(y_test, log_pred))

# f1 score: best value at 1 (perfect precision and recall) and worst at 0.
print('F1 score: %.3f' % f1_score(y_test, log_pred))
Accuracy: 0.821
Cross-validation accuracy: 0.779
Precision: 0.673
Recall: 0.346
F1 score: 0.457
In [14]:
# Predict probabilities for the test data.
logreg_probs = logreg.predict_proba(X_test)

# Keep Probabilities of the positive class only.
logreg_probs = logreg_probs[:, 1]

# Compute the AUC Score.
auc_logreg = roc_auc_score(y_test, logreg_probs)
print('AUC: %.2f' % auc_logreg)
AUC: 0.77

Confusion matrix for Logistic Regression

In [15]:
# Plot confusion matrix for Logistic Regression.
cm = ConfusionMatrix(logreg, is_fitted=True)
cm.score(X_test, y_test)
cm.poof()
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ac3960bb88>
In [16]:
# Get the ROC curves.
logreg_fpr, logreg_tpr, logreg_thresholds = roc_curve(y_test, logreg_probs)

# Plot the ROC curve.
plt.figure(figsize=(8, 8))
plt.plot(logreg_fpr, logreg_tpr, color='red',
         label='Logistic Regression ROC (AUC= %0.2f)' % auc_logreg)
plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--', label='random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curves')
plt.legend()
plt.show()

Decision tree

In [17]:
# Initialize a decision tree estimator.
tr = tree.DecisionTreeClassifier(
    max_depth=3, criterion='gini', random_state=25)

# Train the estimator.
tr.fit(X_train, y_train)
Out[17]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
                       max_features=None, max_leaf_nodes=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, presort=False,
                       random_state=25, splitter='best')
In [18]:
# Plot the tree (the estimator is already fitted above).
fig = plt.figure(figsize=(23, 15))
tree.plot_tree(tr, feature_names=X.columns,
               filled=True, rounded=True, fontsize=16)
plt.title('Decision Tree')
Out[18]:
Text(0.5, 1.0, 'Decision Tree')
In [19]:
# Print the tree in a simplified version.
r = export_text(tr, feature_names=X.columns.tolist())
print(r)
|--- PAY_1_2 <= 1.43
|   |--- PAY_2_2 <= 1.10
|   |   |--- PAY_AMT2 <= -0.18
|   |   |   |--- class: 0
|   |   |--- PAY_AMT2 >  -0.18
|   |   |   |--- class: 0
|   |--- PAY_2_2 >  1.10
|   |   |--- PAY_1_3 <= 4.62
|   |   |   |--- class: 0
|   |   |--- PAY_1_3 >  4.62
|   |   |   |--- class: 1
|--- PAY_1_2 >  1.43
|   |--- PAY_3_-1 <= 0.77
|   |   |--- PAY_5_2 <= 1.47
|   |   |   |--- class: 1
|   |   |--- PAY_5_2 >  1.47
|   |   |   |--- class: 1
|   |--- PAY_3_-1 >  0.77
|   |   |--- BILL_AMT1 <= -0.66
|   |   |   |--- class: 0
|   |   |--- BILL_AMT1 >  -0.66
|   |   |   |--- class: 1

In [20]:
# Make predictions.
tr_pred = tr.predict(X_test)

# CV score
tr_cv = cross_val_score(tr, X, y, cv=10)

Metrics for Decision tree

In [21]:
# Accuracy: 1 is perfect prediction.
print('Accuracy: %.3f' % tr.score(X_test, y_test))

# Cross-Validation accuracy
print('Cross-validation accuracy: %0.3f' % tr_cv.mean())

# Precision
print('Precision: %.3f' % precision_score(y_test, tr_pred))

# Recall
print('Recall: %.3f' % recall_score(y_test, tr_pred))

# f1 score: best value at 1 (perfect precision and recall) and worst at 0.
print('F1 score: %.3f' % f1_score(y_test, tr_pred))
Accuracy: 0.821
Cross-validation accuracy: 0.818
Precision: 0.713
Recall: 0.298
F1 score: 0.420
In [22]:
# Predict probabilities for the test data.
tr_probs = tr.predict_proba(X_test)

# Keep Probabilities of the positive class only.
tr_probs = tr_probs[:, 1]

# Compute the AUC Score.
auc_tr = roc_auc_score(y_test, tr_probs)
print('AUC: %.2f' % auc_tr)
AUC: 0.73

Confusion Matrix for Decision tree

In [23]:
# Plot confusion matrix for Decision tree.
cm = ConfusionMatrix(tr, is_fitted=True)
cm.score(X_test, y_test)
cm.poof()
Out[23]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ac3850b288>
In [24]:
# Get the ROC curves.
tr_fpr, tr_tpr, tr_thresholds = roc_curve(y_test, tr_probs)

# Plot the ROC curve.
plt.figure(figsize=(8, 8))
plt.plot(tr_fpr, tr_tpr, color='red',
         label='Decision tree ROC (AUC= %0.2f)' % auc_tr)
plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--', label='random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curves')
plt.legend()
plt.show()

Neural network

In [25]:
# Initialize a Multi-layer Perceptron classifier.
mlp = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, activation='logistic',
                    alpha=0.01, random_state=25, shuffle=True, verbose=False)

# Train the classifier.
mlp.fit(X_train, y_train)
Out[25]:
MLPClassifier(activation='logistic', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=False, epsilon=1e-08,
              hidden_layer_sizes=(32, 32), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=25, shuffle=True, solver='adam', tol=0.0001,
              validation_fraction=0.1, verbose=False, warm_start=False)
In [26]:
# Make predictions.
mlp_pred = mlp.predict(X_test)

# CV score
mlp_cv = cross_val_score(mlp, X, y, cv=10)

Metrics for Neural Network

In [27]:
# Accuracy: 1 is perfect prediction.
print('Accuracy: %.3f' % mlp.score(X_test, y_test))

# Cross-Validation accuracy
print('Cross-validation accuracy: %0.3f' % mlp_cv.mean())

# Precision
print('Precision: %.3f' % precision_score(y_test, mlp_pred))

# Recall
print('Recall: %.3f' % recall_score(y_test, mlp_pred))

# f1 score: best value at 1 (perfect precision and recall) and worst at 0.
print('F1 score: %.3f' % f1_score(y_test, mlp_pred))
Accuracy: 0.811
Cross-validation accuracy: 0.779
Precision: 0.606
Recall: 0.380
F1 score: 0.467
In [28]:
# Predict probabilities for the test data.
mlp_probs = mlp.predict_proba(X_test)

# Keep probabilities of the positive class only.
mlp_probs = mlp_probs[:, 1]

# Compute the AUC Score.
auc_mlp = roc_auc_score(y_test, mlp_probs)
print('AUC: %.2f' % auc_mlp)
AUC: 0.75

Confusion Matrix for Neural Network

In [29]:
# Plot confusion matrix for Neural Network.
cm = ConfusionMatrix(mlp, is_fitted=True)
cm.score(X_test, y_test)
cm.poof()
Out[29]:
<matplotlib.axes._subplots.AxesSubplot at 0x1ac385b17c8>
In [30]:
# Get the ROC curves.
mlp_fpr, mlp_tpr, mlp_thresholds = roc_curve(y_test, mlp_probs)

# Plot the ROC curve.
plt.figure(figsize=(8, 8))
plt.plot(mlp_fpr, mlp_tpr, color='red', label='MLP ROC (AUC= %0.2f)' % auc_mlp)
plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--', label='random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curves')
plt.legend()
plt.show()

Summary

In [31]:
# Metric names for the x-axis (not named `metrics`, which would shadow the
# sklearn module imported earlier).
metric_names = ['Accuracy', 'CV accuracy', 'Precision', 'Recall', 'F1', 'ROC AUC']

# Plot metrics.
fig = go.Figure(data=[
    go.Bar(name='Logistic Regression', x=metric_names,
           y=[logreg.score(X_test, y_test), logreg_cv.mean(), precision_score(y_test, log_pred), recall_score(y_test, log_pred), f1_score(y_test, log_pred), auc_logreg]),
    go.Bar(name='Decision tree', x=metric_names,
           y=[tr.score(X_test, y_test), tr_cv.mean(), precision_score(y_test, tr_pred), recall_score(y_test, tr_pred), f1_score(y_test, tr_pred), auc_tr]),
    go.Bar(name='Neural Network', x=metric_names,
           y=[mlp.score(X_test, y_test), mlp_cv.mean(), precision_score(y_test, mlp_pred), recall_score(y_test, mlp_pred), f1_score(y_test, mlp_pred), auc_mlp])
])

fig.update_layout(title_text='Metrics for each model',
                  barmode='group', xaxis_tickangle=-45, bargroupgap=0.05)
fig.show()
In [32]:
# Plot the ROC curve.
plt.figure(figsize=(8, 8))
plt.plot(mlp_fpr, mlp_tpr, color='green',
         label='MLP ROC (AUC= %0.2f)' % auc_mlp)
plt.plot(tr_fpr, tr_tpr, color='orange',
         label='Decision tree ROC (AUC= %0.2f)' % auc_tr)
plt.plot(logreg_fpr, logreg_tpr, color='red',
         label='LogReg ROC (AUC= %0.2f)' % auc_logreg)
plt.plot([0, 1], [0, 1], color='darkblue', linestyle='--', label='random')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curves')
plt.legend()
plt.show()

Results

In [33]:
d = {
    '': ['Logistic Regression', 'Decision Tree', 'Neural Network (MLP)'],
    'Accuracy': [logreg.score(X_test, y_test), tr.score(X_test, y_test), mlp.score(X_test, y_test)],
    'CV Accuracy': [logreg_cv.mean(), tr_cv.mean(), mlp_cv.mean()],
    'Precision': [precision_score(y_test, log_pred), precision_score(y_test, tr_pred), precision_score(y_test, mlp_pred)],
    'Recall': [recall_score(y_test, log_pred), recall_score(y_test, tr_pred), recall_score(y_test, mlp_pred)],
    'F1': [f1_score(y_test, log_pred), f1_score(y_test, tr_pred), f1_score(y_test, mlp_pred)],
    'ROC AUC': [auc_logreg,  auc_tr, auc_mlp]
}

results = pd.DataFrame(data=d).round(4).set_index('')
results
Out[33]:
                      Accuracy  CV Accuracy  Precision  Recall  F1      ROC AUC
Logistic Regression   0.8209    0.7787       0.6733     0.3461  0.4572  0.7656
Decision Tree         0.8208    0.8177       0.7127     0.2976  0.4199  0.7269
Neural Network (MLP)  0.8111    0.7787       0.6064     0.3798  0.4670  0.7510

Paper results

1) Yeh, I. C., & Lien, C. H. (2009). The comparisons of data mining techniques for the predictive accuracy of probability of default of credit card clients. Expert Systems with Applications, 36(2), 2473-2480.

                     Error rate  Accuracy
Logistic Regression  0.18        0.82
Decision tree        0.17        0.83
Neural Network       0.17        0.83

2) Liu, R.L. (2018) Machine Learning Approaches to Predict Default of Credit Card Clients. Modern Economy, 9, 1828-1838.

                Accuracy  F1
Decision tree   0.7973    0.4912
Neural Network  0.8227    0.4593